This dataset is a financial time series of Bund futures, i.e. the futures contract on the 10-year bond issued by the German government. A futures contract is a commitment to buy or sell a given underlying product at a set date and at a price agreed in advance.
The dataset starts on 12th May 2016 and ends on 29th Sep 2017. It is a subset of a proprietary dataset belonging to the independent quantitative trading research project FQT Research, whose owners kindly agreed to let it be used for this project. A few dates are missing because of occasional server-side connection issues.
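Each row is a snapshot, taken every 5 minutes, of the bid and ask prices (bid, ask), the cumulative bid and ask volumes (bidv, askv), and the cumulative bid and ask volumes from trades larger than 99 contracts (hundreds_bidv, hundreds_askv).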
Some notes for the sake of clarification:
The dataset
## Summary of the entire dataset:
## bid ask bidv askv
## Min. :158.8 Min. :158.8 Min. : 0 Min. : 1
## 1st Qu.:161.6 1st Qu.:161.6 1st Qu.: 82751 1st Qu.: 83774
## Median :163.2 Median :163.2 Median :178627 Median :179510
## Mean :163.0 Mean :163.0 Mean :187335 Mean :187771
## 3rd Qu.:164.3 3rd Qu.:164.3 3rd Qu.:275860 3rd Qu.:276468
## Max. :168.5 Max. :168.5 Max. :656814 Max. :641295
## hundreds_bidv hundreds_askv
## Min. : 0 Min. : 0
## 1st Qu.: 23402 1st Qu.: 23887
## Median : 53558 Median : 53812
## Mean : 58247 Mean : 58438
## 3rd Qu.: 87036 3rd Qu.: 88012
## Max. :231872 Max. :215364
For this particular dataset, it is key to understand that, in our analysis, bid and ask are used solely to compute the mid price by averaging them. Moreover, we are less interested in the mid price itself than in its first difference over time, which we compute below, because what matters to us is the change in price rather than its level.
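The mid price and its first difference can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the original analysis; the quotes are hypothetical values chosen for readability, and the names `bids`, `asks`, `mids` and `mid_diffs` are assumptions introduced here.

```python
# Hypothetical 5-minute bid/ask quotes (illustrative values, not taken from the dataset).
bids = [163.50, 163.52, 163.49, 163.47]
asks = [163.51, 163.53, 163.50, 163.48]

# Mid price: the average of bid and ask at each snapshot.
mids = [(bid + ask) / 2 for bid, ask in zip(bids, asks)]

# First difference: change in mid price from one snapshot to the next.
# Rounding only removes floating-point noise from the subtraction.
mid_diffs = [round(later - earlier, 4) for earlier, later in zip(mids, mids[1:])]

print(mids)
print(mid_diffs)
```

Differencing yields one value fewer than the number of snapshots, because the first snapshot has no predecessor.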
All other values are cumulative volumes, i.e. each is a snapshot of the running total at that time of the day. Within the limited scope of this project, we have chosen not to investigate the differences between bid volume and ask volume and to focus instead on the overall volume, obtained by simply summing the two. Again, we are not interested in the volume data per se but rather in its first difference over time.
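As a small worked example, the snippet below sums the first few cumulative bid and ask volumes printed in the data frame structure later in this report and differences the result. It is a plain-Python sketch rather than the original code, and the variable names simply mirror the column names. How the original analysis treated the first interval of each day, where the cumulative counter presumably restarts, is not stated, so that case is left out.

```python
# First five cumulative volume snapshots, copied from the data frame
# structure printed in this report.
bidv = [0, 435, 1318, 2737, 3481]
askv = [433, 1145, 1690, 2523, 3156]

# Total cumulative volume: bid volume plus ask volume at each snapshot.
totalv = [b + a for b, a in zip(bidv, askv)]

# Contracts traded in each 5-minute interval: difference of consecutive totals.
totalv_diffs = [later - earlier for earlier, later in zip(totalv, totalv[1:])]

print(totalv)        # [433, 1580, 3008, 5260, 6637]
print(totalv_diffs)  # [1147, 1428, 2252, 1377]
```

Both printed lists agree with the `totalv` values shown in the two data frame structures of this report.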
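We have chosen to limit the focus of our analysis to the following three key features: the 5-minute change in mid price (mid), the 5-minute change in total volume (totalv), and the 5-minute change in the volume of trades larger than 99 contracts (hundreds_totalv).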
More detail on their computation will be provided below.
## Summary of changes in price over 5-min intervals (in absolute value):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.01 0.02 0.02 0.03 0.95
In order to measure the volatility of the price, we computed the differentials between the price at time t and the price at time t + 5 minutes. We then took the absolute value of the differentials because in this case we are more interested in how much the price moved than in its direction. We can see that the range is very wide, from 0 to 0.95.
## Summary of changes in price over 5-min intervals (in absolute value)
## below 99th centile:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01000 0.02000 0.02286 0.03000 0.11500
This is the same summary but, to give better insight into the usual case, with all data points beyond the 99th centile removed. We can see that in 99% of the cases the price change is between 0 and 0.115. This implies that there are occasional, exceptionally large outliers between 0.115 and 0.95.
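The trimming step can be sketched like this. It is a plain-Python illustration under stated assumptions: the data are hypothetical, the helper name `below_99th_centile` is introduced here, and the choice of the "inclusive" interpolation method and of keeping values equal to the cut-off are guesses, since the report does not say how the centile was computed.

```python
import statistics

def below_99th_centile(values):
    """Return the values at or below the 99th centile, plus the cut-off itself."""
    # quantiles(..., n=100) returns the 99 cut points between centiles;
    # index 98 is the 99th centile.
    cut = statistics.quantiles(values, n=100, method="inclusive")[98]
    return [v for v in values if v <= cut], cut

# Hypothetical absolute 5-minute price changes: 99 ordinary moves and one outlier.
abs_changes = [0.01] * 50 + [0.02] * 30 + [0.03] * 19 + [0.95]

trimmed, cut = below_99th_centile(abs_changes)
print(len(abs_changes), len(trimmed), max(trimmed))
```

With these numbers the single outlier is dropped and the maximum of the trimmed series falls back to an ordinary move, which is the effect visible in the summary above.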
## Summary of changes in total volume over 5-min intervals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 837 2998 3674 5184 79099
This summary gives an idea of the 5-min liquidity, by which we mean the number of contracts traded among market participants. Again the range of values is quite wide, from 0 to 79099.
## Summary of changes in total volume over 5-min intervals
## below 99th centile:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 816.8 2961.0 3476.9 5097.0 16421.0
As we did for the volatility summary, to give better insight into the usual case, we removed all data points beyond the 99th centile. We can see that in 99% of the cases the number of traded contracts is between 0 and 16421. This implies that there are occasional, exceptionally large outliers, as we saw when we measured the volatility.
## Summary of changes in total volume greater than 99 contracts
## over 5-min intervals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 100 732 1135 1642 23489
Here we are measuring the number of contracts traded in trades whose size was greater than 99. It is worth pointing out that each trade has a size: a market participant can choose to trade either one contract at a time or many contracts at once. Only large traders such as banks, hedge funds or exceptionally wealthy individuals have the capacity to trade one hundred contracts or more at once. Therefore, this measure reveals the presence of large traders. However, it is not sufficient to reveal the extent of their presence, as nothing prevents these traders from trading small quantities as well. It is nonetheless a lower bound on that extent.
## Summary of changes in total volume greater than 99 contracts
## over 5-min intervals, below 99th centile:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 100 716 1058 1604 6408
As above, to give better insight into the usual case, we removed all data points beyond the 99th centile. We can see that in 99% of the cases the number of traded contracts is between 0 and 6408. This implies that there are occasional, exceptionally large outliers, as we saw when analyzing the previous variables.
This plot shows the distribution of the liquidity over 5-min intervals. We can see that the mode is in the first bin, between 0 and 200. This is likely because many of these 5-min intervals fall at times when the market is open but there is little activity. The vertical line in the middle represents the median of the distribution, while the dashed lines represent respectively the 10th, 25th, 75th and 90th centile.
This plot shows the distribution, over 5-min intervals, of the subset of the liquidity where the trade size was larger than 99 contracts. It seems reasonable for the data to have this shape because, as we pointed out earlier, these are the trades made by large traders. Since it takes a lot of financial capacity to make these trades, it is plausible that the most frequent observations are the lowest and the least frequent ones are the highest. As in the previous plot, the vertical line in the middle represents the median of the distribution, while the dashed lines represent respectively the 10th, 25th, 75th and 90th centile.
This plot shows the distribution of price movements over 5-minute intervals. In this case, we are not looking at the absolute value because we are interested in seeing whether there is any skew in the distribution. The distribution appears quite symmetrical to the naked eye, so it seems appropriate to infer that there are approximately as many positive as negative price changes. This also appears to legitimize the choice of looking at the absolute value of changes in price, since taking the sign away does not appear to cause a substantial loss of information.
## 'data.frame': 51787 obs. of 9 variables:
## $ bid : num 164 164 164 164 164 ...
## $ ask : num 164 164 164 164 164 ...
## $ bidv : num 0 435 1318 2737 3481 ...
## $ askv : num 433 1145 1690 2523 3156 ...
## $ hundreds_bidv : num 0 0 109 239 239 470 470 569 569 669 ...
## $ hundreds_askv : num 433 543 543 543 543 ...
## $ mid : num 164 164 164 164 164 ...
## $ totalv : num 433 1580 3008 5260 6637 ...
## $ hundreds_totalv: num 433 543 652 782 782 ...
## 'data.frame': 51475 obs. of 3 variables:
## $ mid : num 0.015 -0.03 -0.02 0.01 0.01 ...
## $ totalv : num 1147 1428 2252 1377 881 ...
## $ hundreds_totalv: num 110 109 130 0 231 120 199 114 227 0 ...
There are 51787 timestamps in the dataset with 9 features. We then created a diffs dataframe containing the differentials of price and volume, as well as the time of day and the day of the week. This will enable us to group the data points by weekday and time of day and investigate further. Additional details about the individual variables can be found below each plot.
The main features of interest are the price differentials and the volume. The price differentials matter in that they provide an idea of the volatility of the price. The volume matters in that it provides an idea of how liquid the market is. Both are important when it comes to making trading decisions.
Day of week and time of day are very important because there are some time spans over the day when the market is more liquid and volatile due to the fact that exchanges other than EUREX are open at certain specific times. This is quite likely to have an impact on how the price moves.
Yes, I created the mid price variable, which is the average of the bid price and the ask price. As we pointed out earlier, brokers very often keep a spread between the price offered to buyers and the price offered to sellers. For Euro-Bund futures this spread is .01 in the overwhelming majority of cases. The mid price isn’t actually available to the public, but it is a reasonable basis for measuring price changes.
Yes, the distribution of the volume (totalv) looks quite unusual. The first bin from the left is the mode of the distribution; the count then slowly decreases to a first valley around the 10th centile, rises to a local peak around the median and then progressively decreases until it fades. It might be interesting to explore the shape of the distribution across the hours of the day. Without seeing the distribution by hour of day, one might suspect that the many points constituting the mode of the distribution fall in the hours when London and other European markets are closed.
This plot aims at verifying the hypothesis formulated in the paragraph above. It shows the distribution of volume (totalv) by hour of day. The vertical line in the middle represents the median of the distribution, while the dashed lines represent respectively the 10th, 25th, 75th and 90th centile. It is important to clarify that the centiles refer to the overall, unsplit distribution. Their purpose is to show how the distribution of each hour is similar to or different from the general one. Indeed, this plot confirms the hypothesis: after 18:00, the distribution becomes very positively skewed, to such an extent that it affects the overall distribution.
This scatterplot shows the liquidity by hour of day. The dashed lines represent respectively the 10th and the 90th centile, while the line in the middle represents the median by hour. From here we can see a reasonable pattern:
This plot represents the price changes every 5 minutes in absolute terms, i.e. irrespective of their direction. It measures the volatility of the price over 5-minute intervals.
We can see a few spikes:
We can also see a clear decrease in volatility after 17:30, when the London Stock Exchange and most European markets close.
Since price changes are expressed in absolute terms, there is of course a lower bound at zero. The median is at 0.02 most of the time, with a few spikes at 0.03 and 0.04 in the time ranges mentioned above, and it then drops to 0.01 after London and other European markets close. It is interesting to notice that approximately 90% of the price moves are below 0.07, yet approximately 1 in 10 moves is above that threshold, with some very significant outliers left out of this plot only for the sake of readability.
## Correlation between liquidity and volatility:
## 0.5269955
The coefficient of about 0.53 suggests a moderate positive correlation between liquidity (totalv) and volatility (abs(mid)): when liquidity becomes high it is rarer to observe low volatility, and when volatility becomes high it is rarer to observe low liquidity. As a result, the shape of our jitter plot has a diagonal tendency. We nonetheless find it too foggy to believe confidently in a reliable relationship, as there are too many cases where liquidity is relatively high and volatility is relatively low, and vice versa.
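For reference, a correlation coefficient of this kind can be computed as sketched below. The sketch assumes the figure reported above is a Pearson correlation, which the report does not state explicitly. The helper `pearson` and the sample data are hypothetical and are not the original code or values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 5-minute volume changes and absolute price changes.
volume_changes = [200, 1500, 3000, 5200, 800, 9000]
abs_price_changes = [0.01, 0.02, 0.02, 0.05, 0.03, 0.06]

r = pearson(volume_changes, abs_price_changes)
print(round(r, 3))
```

A value near 1 indicates that the two series rise together almost linearly, a value near 0 indicates no linear relationship, and a mid-range value such as the one reported above leaves room for many high-liquidity, low-volatility intervals.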
## # A tibble: 5 x 4
## weekday absmid_median absmid_75th_centile absmid_90th_centile
## <fctr> <dbl> <dbl> <dbl>
## 1 Monday 0.015 0.03 0.05
## 2 Tuesday 0.020 0.03 0.05
## 3 Wednesday 0.020 0.03 0.05
## 4 Thursday 0.020 0.04 0.06
## 5 Friday 0.020 0.03 0.06
This boxplot shows the 5-min volatility by weekday:
It is interesting to notice that Friday’s 90th centile is at .06 like Thursday’s, while its 75th is at .03 like those of Monday to Wednesday. This may suggest that on Fridays volatility is lower in the usual case but higher in the tail of the distribution.
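A per-weekday summary like the table above can be sketched as follows. This is a plain-Python illustration, not the original code: the observations are hypothetical, the dictionary keys are names introduced here, and the "inclusive" interpolation method is an assumption about how the centiles were computed.

```python
import statistics
from collections import defaultdict

# Hypothetical (weekday, absolute 5-minute price change) observations.
observations = [
    ("Monday", 0.01), ("Monday", 0.02), ("Monday", 0.03),
    ("Monday", 0.01), ("Monday", 0.05),
    ("Thursday", 0.02), ("Thursday", 0.04), ("Thursday", 0.02),
    ("Thursday", 0.06), ("Thursday", 0.01),
]

# Group the absolute changes by weekday.
by_weekday = defaultdict(list)
for weekday, abs_change in observations:
    by_weekday[weekday].append(abs_change)

# Median, 75th and 90th centile for each weekday.
summary = {}
for weekday, changes in by_weekday.items():
    quartiles = statistics.quantiles(changes, n=4, method="inclusive")
    deciles = statistics.quantiles(changes, n=10, method="inclusive")
    summary[weekday] = {
        "median": statistics.median(changes),
        "75th_centile": quartiles[2],
        "90th_centile": deciles[8],
    }

for weekday, stats in summary.items():
    print(weekday, stats)
```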
It is also interesting to notice that 1 in 100 observations is above .12 and can go as far as .95, which is quite a wild move for a 5-minute interval. This means that, on average, one in every hundred 5-min intervals has a swing between .12 and .95. One in a hundred 5-min intervals means one every 500 minutes, i.e. approximately one every 8.33 hours. Given that the EUREX Exchange is open 14 hours per day (from 8 to 22), we may expect about 1.7 such moves per day. This arguably appears too frequent to be ignored.
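The arithmetic in the paragraph above can be checked with a few lines of Python. The variable names are introduced here for illustration; the inputs are the figures quoted in the text.

```python
interval_minutes = 5   # length of one observation interval
one_in = 100           # one interval in a hundred lies above the 99th centile
session_hours = 14     # EUREX session from 8:00 to 22:00

# Average waiting time between two moves above the 99th centile.
minutes_between_moves = interval_minutes * one_in   # 500 minutes
hours_between_moves = minutes_between_moves / 60    # about 8.33 hours

# Expected number of such moves in one trading session.
moves_per_day = session_hours / hours_between_moves

print(minutes_between_moves, round(hours_between_moves, 2), round(moves_per_day, 2))
```

Equivalently, a 14-hour session contains 168 five-minute intervals, and one percent of 168 is 1.68, which rounds to the 1.7 quoted above.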
This plot shows only those points whose absolute mid value was above the 99th centile, i.e. all the points we did not consider in the boxplot showing the 5-min volatility by weekday. While that choice was legitimate for its specific purpose, it is worth noting that, as we suspected above and as this plot confirms, these points appear too frequent to be ignored. Looking at this plot with the naked eye, one might be tempted to treat only the points above .25 or below -.25 as true outliers.
Yet, for the sake of keeping correct perspective, it is also useful to keep in mind that the bulk of observations is within -.1 and .1, as it is shown by this plot of all observations with an alpha of .1.
We have uncovered a few time-related patterns, which we discussed in greater depth below each plot for the sake of immediate clarity.
No, not really.
We did not find any particularly strong relationship, but rather uncovered a few time-related patterns.
## # A tibble: 5 x 5
## weekday totalv_25th_centile totalv_median totalv_75th_centile
## <fctr> <dbl> <dbl> <dbl>
## 1 Monday 602.00 2463 4284.75
## 2 Tuesday 876.00 3098 5245.00
## 3 Wednesday 1044.00 3259 5521.00
## 4 Thursday 945.00 3414 5790.00
## 5 Friday 752.25 2882 4960.50
## # ... with 1 more variables: totalv_90th_centile <dbl>
This plot shows that we have an upward trend in liquidity from Monday to Thursday and then it subsides on Friday. The median volume of each day is respectively 2463, 3098, 3259, 3414, 2882.
This plot shows the size of the price changes every 5 minutes, split by direction of price movement: ‘up’ when the change in price is positive and ‘down’ when it is negative. The behavior of the two groups appears to be quite similar. There are a few differences here and there but they appear to be negligible. This seems to confirm the symmetrical distribution of price changes that we observed earlier.
This is the same plot as above but with an additional split by day of week. A few differences from day to day can be noticed here and there, yet they are not easy to spot.
## Summary of median change in price by hour of day and day of week:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.01000 0.02000 0.01897 0.02000 0.05000
This 3D surface aims at uncovering patterns in a more intuitive way. It shows the median volatility of the price as a function of day of week and time of day. It is red when volatility is high (or metaphorically hot) and blue when it is low (or metaphorically cold).
This heatmap aims at showing the same as above, but it seems to us that it shows the outcome in an even more visually clear way.
## Summary of median change in volume by hour of day and day of week:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 76.0 815.6 3574.2 3135.9 4755.1 9163.0
This 3D surface shows the median liquidity as a function of day of week and time of day. Again, it is red when liquidity is high (or metaphorically hot) and blue when it is low (or metaphorically cold).
This heatmap aims at showing the same as above, but it seems to us that it shows the outcome in an even more visually clear way.
## Summary of median proportion of large trades vs overall volume
## by hour of day and day of week:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1082 0.2812 0.2133 0.3150 0.3847
We saw that usually there is very little liquidity before 9:00 and after 17:30, relatively high liquidity in the morning from 9:00 to 12:00 and highest liquidity between 14:30 and 17:30, when both European and US markets are open and active.
What appeared quite interesting to us is the fact that between 8:00 and 9:00 the volatility is high and the liquidity is low. This implies it might be risky to trade in that environment: if the price moves against the trader, the low liquidity may make it harder to find a counterpart at the desired exit price, potentially forcing the trader to exit at a bad price.
This heatmap shows the median volatility of the price as a function of day of week and time of day. It is red when volatility is high (or metaphorically hot) and blue when it is low (or metaphorically cold). Knowing the usual level of volatility is valuable for a trader because some volatility is useful to find profitable trading opportunities, yet too much of it may cause unexpected large losses, which might take a relatively long time to recover. We can see there are some moments with extreme peaks that one might choose to either ride or avoid, based on the ability of one's trading models to detect an accurate signal. In particular, we can notice some volatility peaks when the week begins on Monday morning at 8, on Tuesdays at 9 and on Fridays around 14:30.
This heatmap shows the median liquidity as a function of day of week and time of day. It is red when liquidity is high (or metaphorically hot) and blue when it is low (or metaphorically cold). Knowing the usual level of liquidity is valuable for a trader because liquidity is needed in order to find a counterpart at the desired exit price. From this plot we can see that there is very little liquidity before 8:00, barely any after 18:00, and some extreme peaks on Wednesdays around 11:30, on Fridays around 14:30 and from Tuesday to Thursday around 17:15.
This heatmap shows the median proportion of total liquidity that is due to trades larger than 99 contracts, as a function of day of week and time of day. It is red when it is high (or metaphorically hot) and blue when it is low (or metaphorically cold). Knowing the usual level of this proportion is valuable for a trader because it shows the presence of large players in the market over a given time span. A trader might suspect that their model is more robust if it usually trades at times when large players are in the game.
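One way to obtain such a proportion is sketched below, using the first five 5-minute differences printed in the diffs data frame structure earlier in this report. It assumes the ratio is computed per interval and the median is taken afterwards. The report does not say whether that order or a ratio of medians was used, nor how intervals with zero volume were handled, so both choices here are assumptions.

```python
import statistics

# First five 5-minute differences, copied from the diffs data frame
# structure printed in this report.
totalv = [1147, 1428, 2252, 1377, 881]
hundreds_totalv = [110, 109, 130, 0, 231]

# Share of each interval's volume that came from trades larger than 99 contracts.
# Intervals with no volume are skipped to avoid dividing by zero (an assumption).
proportions = [
    large / total
    for large, total in zip(hundreds_totalv, totalv)
    if total > 0
]

median_proportion = statistics.median(proportions)
print([round(p, 3) for p in proportions])
print(round(median_proportion, 3))
```

In the heatmap, the same reduction would be applied separately to the intervals falling in each (weekday, time of day) cell.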
The thing that required most attention was clarifying what I wanted to do. I noticed that until I have clearly defined what I want to visualize, any attempt is a struggle. On the other hand, after careful reflection on what it is that we want to see, it becomes much easier to obtain it. In particular, the heatmaps required some thought in order to clarify that for each (time, weekday) pair we needed to reduce all the points to their median, and to use the long or wide form based on whether we were using plotly’s 3D surface or ggplot2’s heatmap.
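The reduction described above, one median per (time, weekday) pair followed by a long or wide layout, can be sketched in plain Python. The observations and all names are hypothetical, and the sketch is not the original code. As far as we know, a ggplot2 heatmap is usually fed the long form (one row per cell) while a plotly 3D surface expects a matrix, i.e. the wide form.

```python
import statistics
from collections import defaultdict

# Hypothetical (weekday, hour, 5-minute volume change) observations.
observations = [
    ("Monday", 9, 3000), ("Monday", 9, 3400), ("Monday", 9, 2600),
    ("Monday", 18, 100), ("Monday", 18, 300),
    ("Tuesday", 9, 4000), ("Tuesday", 9, 4200),
    ("Tuesday", 18, 200),
]

# Collect all observations that share the same (weekday, hour) pair.
groups = defaultdict(list)
for weekday, hour, volume in observations:
    groups[(weekday, hour)].append(volume)

# Long form: one (weekday, hour, median) row per pair.
long_form = [
    (weekday, hour, statistics.median(values))
    for (weekday, hour), values in sorted(groups.items())
]

# Wide form: one row per hour, one column per weekday.
weekdays = ["Monday", "Tuesday"]
hours = sorted({hour for _, hour, _ in long_form})
medians = {(weekday, hour): median for weekday, hour, median in long_form}
wide_form = [[medians.get((weekday, hour)) for weekday in weekdays] for hour in hours]

print(long_form)
print(wide_form)
```

Cells with no observations come out as `None` in the wide form, which makes gaps in the data explicit instead of silently shifting columns.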
What was surprising to me was how clearly the patterns of liquidity and volatility emerged and how sharply they differ depending on which markets are open. In particular, we have seen that although the EUREX Exchange is open between 8:00 and 22:00, most of the activity in terms of liquidity and volatility happens between 9:00 and 17:30, i.e. the times when the London Stock Exchange is open. It was interesting to notice that activity fades after 17:30 even though the New York Stock Exchange is open until 22:00 CET.
Further exploration of the data might include producing the same heatmaps only between 9:00 and 17:30, i.e. only when the London Stock Exchange and most other European exchanges are open, under the assumption that the ‘real’ trading day is between 9:00 and 17:30. It may then be interesting to re-assess what counts as a high, medium or low level of liquidity, volatility or other features within that subset of the data. Moreover, a trader might want to produce a heatmap of the performance of their model as a function of time of day and day of week and compare it to the heatmaps of liquidity, volatility and other features, in order to obtain a visual intuition of whether their model performs better or worse when the features take a certain range of values.